Buffer optimization for solid-state drives

ABSTRACT

A solid-state drive having an integrated circuit comprising a controller that is configured to determine, for data transferred between a host interface of the integrated circuit and nonvolatile semiconductor storage device interface of the integrated circuit, the availability of an internal buffer of the integrated circuit to transparently accumulate the transferred data, and (i) if the internal buffer is available, accumulate the data from target nonvolatile semiconductor storage devices or the host in the internal buffer, or (ii) if the internal buffer is not available, accumulate the data unit from the target nonvolatile semiconductor storage devices or the host in an external buffer communicatively coupled to the controller, wherein the external buffer is external to the integrated circuit. The controller then provides the accumulated data to the respective interfaces to furnish a read or write request from the host.

CROSS-REFERENCE TO RELATED CASES

This application is a continuation of application Ser. No. 16/836,112 filed on Mar. 31, 2020, the entire contents of which are incorporated herein by reference.

FIELD

The present disclosure relates to solid-state drives and methods that enable a reduction in DRAM bandwidth usage, a reduction in power consumption, and a reduction in latency when processing read and write requests from a host.

BACKGROUND

A solid-state drive (SSD) generally has faster performance, is more compact, and is less sensitive to vibration or physical shock than a conventional magnetic disk drive. Given these advantages, SSDs are being used in more and more computing devices and other consumer products in lieu of or in addition to magnetic disk drives, even though the cost-per-gigabyte storage capacity of SSDs is significantly higher than that of magnetic disk drives.

SSDs utilize physical memory cells that comprise nonvolatile semiconductor storage devices, such as NAND devices, to store data. A system-on-chip (SoC) controller is used in an SSD to manage the transfer of data between a host and the memory cells of the SSD. Writing data to and reading data from the physical memory cells of SSDs typically involves shuffling data between various memory cells. SSDs often employ buffers to handle the transfer of data by the SoC when processing a read request or a write request from a host. Specifically, ring buffers are often used as they are simple to implement and manage in both SoC controller and firmware functions. The ring buffers often occupy a large amount of memory, so they are stored in dynamic random-access memory (DRAM) located external to the controller.

Ring buffers in external DRAM require significant memory bus bandwidth usage to cope with high data transfer rates. Additionally, transferring large amounts of data, as are often involved in read and write requests from a host, to an external DRAM would increase the power consumption of the SSD. There is therefore a long-felt need for optimized data transfers between a host and an SSD which reduce power consumption and reduce latency.

SUMMARY

According to an embodiment of the present disclosure there is provided a solid-state drive (SSD) comprising a plurality of nonvolatile semiconductor storage devices. The SSD also comprises an integrated circuit comprising a host interface configured to communicatively couple the integrated circuit to a host, a controller, and a device interface configured to communicatively couple the integrated circuit to the plurality of nonvolatile semiconductor storage devices. Additionally, the SSD comprises an internal buffer forming a part of the integrated circuit, and an external buffer communicatively coupled to the controller, the external buffer being external to the integrated circuit. The host interface is configured to receive a read request for data from the host, wherein the data subject to the read request is segmented into a plurality of data units. The device interface is configured to determine one or more target nonvolatile semiconductor storage devices of the plurality of nonvolatile semiconductor storage devices in which the data subject to the read request are stored. The controller is configured to determine, for each data unit of the plurality of data units, the availability of the internal buffer to temporarily accumulate the data unit, wherein (i) if the internal buffer is available, accumulate the data unit from the one or more target nonvolatile semiconductor storage devices to the internal buffer, and (ii) if the internal buffer is not available, accumulate the data unit from the one or more target nonvolatile semiconductor storage devices to the external buffer. The host interface is further configured to transfer accumulated data corresponding to the data subject to the read request to the host.

In some implementations, the controller is further configured to remove from the internal buffer and the external buffer the data units that have been transferred to the host. In certain implementations, each of the internal buffer and the external buffer comprises a plurality of write buffers and a plurality of read buffers. In further implementations, each of the read and write buffers comprises a ring buffer. In some implementations, there are more read buffers than write buffers. In certain implementations, the SSD further comprises a programmable firmware configuration circuit coupled to the memory controller that is configured to set a number of read buffers and a number of write buffers in the internal buffer.

In further implementations, the internal buffer resides in a local memory associated with the controller. In some implementations, the local memory comprises static random-access memory (SRAM). In certain implementations, the external buffer resides in a memory external to the integrated circuit. In further implementations, the external memory comprises dynamic random-access memory (DRAM). In some implementations, each nonvolatile semiconductor storage device comprises a NAND chip. In certain implementations, the controller comprises a system-on-chip (SoC) controller.

According to another embodiment of the present disclosure there is provided an SSD comprising a plurality of nonvolatile semiconductor storage devices. The SSD also comprises an integrated circuit comprising a host interface configured to communicatively couple the integrated circuit to a host, a controller, and a device interface configured to communicatively couple the integrated circuit to the plurality of nonvolatile semiconductor storage devices. Additionally, the SSD comprises an internal buffer forming part of the integrated circuit, and an external buffer communicatively coupled to the controller, the external buffer being external to the integrated circuit. The host interface is configured to receive a write request containing data from a host and target nonvolatile semiconductor storage devices of the plurality of nonvolatile semiconductor storage devices in which the data is to be written, wherein the data subject to the write request is segmented into a plurality of data units. The controller is configured to determine, for each data unit of the plurality of data units, the availability of the internal buffer to temporarily accumulate the data unit, wherein (i) if the internal buffer is available, accumulate the data unit from the host to the internal buffer, and (ii) if the internal buffer is not available, accumulate the data unit from the host to the external buffer. The device interface is configured to determine when the target nonvolatile semiconductor storage devices are ready to be written, and transfer accumulated data corresponding to the data subject to the write request to the target nonvolatile semiconductor storage devices when ready.

In certain implementations, the controller is configured to remove from the internal buffer the data units that have been transferred to the target nonvolatile semiconductor storage devices. In some implementations, the controller is configured to store a backup copy of the data units accumulated in the internal buffer, in the external buffer. In further implementations, the controller is configured to remove the backup copy of the data units in the external buffer once the accumulated data units are programmed into the target nonvolatile semiconductor storage devices. In certain implementations, the controller is configured to send a message to the host to indicate completion of the write request after all data units are temporarily accumulated in the internal buffer and the external buffer.

In some implementations, the device interface is configured to transfer the accumulated data units to the target nonvolatile semiconductor storage devices as and when the target storage devices become ready. In further implementations, the device interface is configured to program each nonvolatile semiconductor storage device in order to ready them for receiving the data units prior to transferring the accumulated data units. In certain implementations, the controller is configured to transfer the data units in the external buffer instead of the data accumulated in the internal buffer to the target nonvolatile semiconductor storage devices in the event of a power loss or a program failure in at least one of the nonvolatile semiconductor storage devices.

In some implementations, each of the internal buffer and the external buffer comprises a plurality of write buffers and a plurality of read buffers. In certain implementations, each of the read and write buffers comprises a ring buffer. In further implementations, there are more read buffers than write buffers. In some implementations, the SSD further comprises a programmable firmware configuration circuit coupled to the memory controller that is configured to set a number of read buffers and a number of write buffers in the internal buffer.

In certain implementations, the internal buffer resides in a local memory associated with the controller. In further implementations, the local memory comprises static random-access memory (SRAM). In some implementations, the external buffer resides in a memory external to the integrated circuit. In certain implementations, the external memory comprises dynamic random-access memory (DRAM). In further implementations, each nonvolatile semiconductor storage device comprises a NAND chip. In some implementations, the controller comprises a system-on-chip (SoC) controller.

According to another embodiment of the present disclosure there is provided a method performed by a controller of an integrated circuit. The method comprises receiving a read request for data from a host interface connected to a host, wherein the data subject to the read request is segmented into a plurality of data units. The method also comprises receiving, from a device interface connected to a plurality of nonvolatile semiconductor storage devices contained in an SSD, one or more target nonvolatile semiconductor storage devices of the plurality of nonvolatile semiconductor storage devices in which the data subject to the read request are stored. The method further comprises determining, for each data unit of the plurality of data units, the availability of an internal buffer of the controller to temporarily accumulate the data unit, wherein the internal buffer and the controller form part of the integrated circuit, and wherein (i) if the internal buffer is available, accumulating the data unit from the one or more target nonvolatile semiconductor storage devices to the internal buffer, and (ii) if the internal buffer is not available, accumulating the data unit from the one or more target nonvolatile semiconductor storage devices to an external buffer communicatively coupled to the controller, wherein the external buffer is external to the integrated circuit. The method also comprises transferring the accumulated data corresponding to the data subject to the read request to the host interface for delivery to the host.

In some implementations, the method further comprises removing, by the controller, the accumulated data units from the internal buffer and the external buffer. In certain implementations, the method also comprises programming, by a firmware configuration circuit coupled to the controller, a number of read buffers and a number of write buffers in the internal buffer, wherein, preferably, the number of read buffers exceeds the number of write buffers.

According to another embodiment of the present disclosure there is provided a method performed by a controller of an integrated circuit. The method comprises receiving a write request from a host interface connected to a host, the write request containing data from the host and target nonvolatile semiconductor storage devices of a plurality of nonvolatile semiconductor storage devices contained in an SSD in which the data is to be written, wherein the data subject to the write request is segmented into a plurality of data units. The method further comprises determining, for each data unit, the availability of an internal buffer of the controller to temporarily accumulate the data unit, wherein the internal buffer and the controller form part of the integrated circuit, and wherein (i) if the internal buffer is available, accumulating the data unit from the host to the internal buffer, and (ii) if the internal buffer is not available, accumulating the data unit from the host to an external buffer communicatively coupled to the controller, wherein the external buffer is external to the integrated circuit. The method also comprises storing a backup copy of the data units accumulated in the internal buffer, in the external buffer.

In some implementations, the method further comprises removing, by the controller, the accumulated data units from the internal buffer and the external buffer. In certain implementations, the method also comprises removing, by the controller, the backup copy of the data units in the external buffer once the data units are programmed into the target nonvolatile semiconductor storage devices. In further implementations, the method comprises sending, by the controller, a message to the host to indicate completion of the write request after the data units are temporarily accumulated in the internal buffer and the external buffer.

In certain implementations, the method further comprises transferring, by the controller, the data units in the external buffer instead of the data accumulated in the internal buffer to the target nonvolatile semiconductor storage devices in the event of a power loss or a program failure in at least one of the nonvolatile semiconductor storage devices. In further implementations, the method further comprises programming, by a firmware configuration circuit coupled to the controller, a number of read buffers and a number of write buffers in the internal buffer, wherein, preferably, the number of read buffers exceeds the number of write buffers.

According to another embodiment of the present disclosure there is provided a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause a controller of an integrated circuit of an SSD to perform a method. The method comprises receiving a read request for data from a host interface connected to a host, wherein the data subject to the read request is segmented into a plurality of data units. The method further comprises receiving from a device interface connected to a plurality of nonvolatile semiconductor storage devices one or more target nonvolatile semiconductor storage devices of the plurality of nonvolatile semiconductor storage devices in which the data subject to the read request are stored. The method also comprises determining, for each data unit of the plurality of data units, the availability of an internal buffer of the controller to temporarily accumulate the data unit, wherein the internal buffer and the controller form part of the integrated circuit, and wherein (i) if the internal buffer is available, accumulating the data unit from the one or more target nonvolatile semiconductor storage devices to the internal buffer, and (ii) if the internal buffer is not available, accumulating the data unit from the one or more target nonvolatile semiconductor storage devices to an external buffer communicatively coupled to the controller, wherein the external buffer is external to the integrated circuit. The method further comprises transferring the accumulated data corresponding to the data subject to the read request to the host interface for delivery to the host.

According to another embodiment of the present disclosure there is provided a non-transitory computer-readable medium storing instructions that, when executed by a processor, cause a controller of an integrated circuit of an SSD to perform a method. The method comprises receiving a write request from a host interface connected to a host, the write request containing data from the host and target nonvolatile semiconductor storage devices of a plurality of nonvolatile semiconductor storage devices contained in the SSD in which the data is to be written, wherein the data subject to the write request is segmented into a plurality of data units. The method also comprises determining, for each data unit, the availability of an internal buffer of the SoC controller to temporarily accumulate the data unit, wherein the internal buffer and the controller form part of the integrated circuit, and wherein (i) if the internal buffer is available, accumulating the data unit from the host to the internal buffer, and (ii) if the internal buffer is not available, accumulating the data unit from the host to an external buffer communicatively coupled to the controller, wherein the external buffer is external to the integrated circuit. The method further comprises storing a backup copy of the data units accumulated in the internal buffer, in the external buffer.

BRIEF DESCRIPTION OF DRAWINGS

The foregoing and other objects and advantages will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 shows a schematic representation of a solid-state drive (SSD), configured according to one or more embodiments of the present disclosure;

FIG. 2 is a flow diagram of method steps for processing a read request from a host according to one or more embodiments of the present disclosure; and

FIG. 3 is a flow diagram of method steps for processing a write request from a host, according to one or more embodiments of the present disclosure.

DETAILED DESCRIPTION

To provide an overall understanding of the devices described herein, certain illustrative embodiments will be described. Although the embodiments and features described herein are specifically described for use in connection with a solid-state drive (SSD) having a controller, it will be understood that all the components and other features outlined below may be combined with one another in any suitable manner and may be adapted and applied to other types of SSD architectures requiring transparent buffer optimization.

FIG. 1 is a block diagram of a computing system 100 comprising at least one host 110 in communication with a storage device 120. The host 110 is a computing system that comprises processors, memory, and other components as is generally known in the art, and which are not shown in FIG. 1 for the sake of brevity. Storage device 120 provides nonvolatile storage functionality for use by the host 110. Storage device 120 is an SSD, which is a nonvolatile storage device that may include an integrated circuit comprising a controller. Such an integrated circuit may also be referred to as a system-on-chip (SoC) 130. SoCs are advantageous in SSDs as they provide a single integrated circuit that contains all the required circuitry and components of the electronic system required for the SSD to function. The SoC therefore eliminates the need for modular architecture connected by a plurality of busses and buffers. SoC 130 is communicatively coupled to nonvolatile semiconductor-based storage elements 140 (such as NAND-based flash memory devices) as the storage medium. The storage medium may comprise a plurality of NAND chips, such as, for example, 32, 64, 128, or 256 separate NAND chips, and each NAND chip can be running separate commands on individual dies (not shown) within the chip. As an example, the storage element 140 comprising N NAND chips each with d dies may be running up to (N×d) NAND commands at any one time.

SSD 120 also includes a memory external to the SoC 130, such as a dynamic random access memory (“DRAM”) 150. SoC 130 comprises a host interface 132 which enables communication with the host 110 for the receipt of read and write requests, for example. SoC 130 also includes a NAND interface 134 for communication with the storage elements 140 (through a plurality of channels such as NAND channels 1 to n as shown in FIG. 1), and a DRAM interface 136 for communication with the memory 150 external to the SoC. Interface 132 on the SoC 130 may comprise a Serial Advanced Technology Attachment (SATA) connector or an NVMe™ connector (NVMe™ is an acronym for “NVM express,” where “NVM” stands for “nonvolatile memory”) operating with a PCIe™ (“Peripheral Component Interconnect Express”) bus, for example. Interface 134 may comprise an Open NAND Flash Interface (ONFI) or a manufacturer's proprietary interface, for example. Interface 136 may comprise, for example, an interface according to, but not limited to: a Double Data Rate (DDR) memory bus standard such as DDR3, DDR4 or DDR5; a Low Power Double Data Rate (LPDDR) memory bus standard such as LPDDR3, LPDDR4 or LPDDR5; or a Hybrid Memory Cube (HMC) memory bus standard.

DRAM 150 comprises several buffers used to buffer data during read and write operations between the host 110 and the storage elements 140. Also shown in FIG. 1 is a memory controller 160 that enables the SoC 130 to perform various functions that facilitate the processing of read and write requests from the host 110. The SoC also includes a firmware configuration circuit 165 that is programmable. This allows the operation of the memory controller 160 to be adapted as needed. For example, the configuration circuit 165 may be programmed to allocate a certain minimum number of DRAM buffers as read buffers, and the remaining buffers as write buffers, so that the size of the read and write buffers may be adjusted according to, for example, the I/O workload from the host 110 being sent to the SSD 120. The SoC 130 also includes a local memory 170, such as a static random access memory (SRAM), that is part of the same integrated circuit as the SoC 130. As with the DRAM 150, the SRAM 170 also comprises several buffers that may be utilized by the memory controller 160 during operation. According to embodiments of the present disclosure, the buffers in the SRAM 170 may be used as read buffers and write buffers. The configuration circuit 165 may also be used to allocate a certain minimum number of the SRAM buffers as read buffers, and the remaining buffers as write buffers. Read and write accesses to the SRAM may complete more quickly than accesses to external DRAM, and the power (energy) used to complete read and write transfers for SRAM may be lower than for external DRAM. In addition, DRAM consumes power in order to keep the data stored refreshed in memory, whereas SRAM only consumes a lower static power and is therefore more power efficient. Therefore, storing data and performing accesses to SRAM may offer advantages of higher speed and lower power than storing and accessing data in external DRAM. However, the size of SRAM is constrained due to space/area constraints on the SoC die and the higher cost of on-chip SRAM over the equivalent amount of external DRAM.

In order to facilitate transfer of data from the host 110 to the NAND devices 140 in response to a write command, or the transfer of data from the NAND devices 140 to the host 110 in response to a read command, the SSD performs some buffering in the local memory 170 that the SoC controller 160 maintains. Typically, the data will be buffered in a write buffer and/or a read buffer before reaching its destination. For a large amount of data, these buffers will be located in a DRAM chip 150 external to the SoC 130, as shown in FIG. 1.

As an example, for each read request received from the host 110, a read command will be sent to the SoC 130 via the host interface 132 where it will be translated by controller 160 into NAND commands (i.e. commands for the NAND devices 140). These commands may require the controller 160 to access several NAND devices 140 simultaneously. The controller 160 will then read data from the NAND devices containing the requested data via the NAND interface 134, perform error correction where necessary, and transfer the corrected data to a read buffer. Typically, the read buffer is located in the external DRAM 150. Read commands to different NAND devices/dies may complete at different times, therefore data is accumulated in the read buffer as each command completes. When the data in the read buffer is ready, the data in the read buffer of the DRAM 150 will be transferred to the host 110. In some implementations, transfer of data from the read buffer may initiate before the whole data buffer is completely ready, when at least a minimum number of contiguous data units from the start of the data buffer are complete, with the expectation that the data buffer will be complete, or nearly complete, by the time the data transfer of the number of contiguous units has finished. The minimum number may, for example, consist of 50%, 60% or 75% of the total data units in the data buffer.
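The early-transfer condition described above can be illustrated with a minimal sketch. The function names and the `min_fraction` parameter are hypothetical and chosen only for illustration; the disclosure gives 50%, 60% or 75% as example thresholds but does not prescribe an implementation.

```python
def contiguous_ready(done_flags):
    """Count completed data units from the start of the buffer up to the first gap."""
    count = 0
    for done in done_flags:
        if not done:
            break
        count += 1
    return count


def can_start_transfer(done_flags, min_fraction=0.5):
    """Return True when enough contiguous leading units are complete.

    min_fraction is an assumed tunable (e.g. 0.5, 0.6 or 0.75).
    """
    if not done_flags:
        return False
    return contiguous_ready(done_flags) >= min_fraction * len(done_flags)
```

In this sketch a completed unit that sits behind an incomplete one does not count toward the threshold, since only the contiguous run from the start of the buffer can be streamed to the host in order.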

Similarly, for each write request received from the host 110, a write command will be sent to the SoC 130 via the host interface 132, which will then be translated by the controller into NAND commands. The write request may also contain target NAND devices 140 to which the data is to be written. Upon receipt of the write request, the controller 160 temporarily stores all the data in a write buffer of the external DRAM 150, after which an acknowledgement message is sent to the host 110. The controller 160 then determines if the target NAND devices 140 are available to store the data. When the target NAND devices 140 are ready, the data in the write buffer of the external DRAM 150 will be written to the target NAND devices 140.

Additionally, the SoC 130 maintains a flash transition table that converts addresses that the host 110 uses to store data. The flash transition table translates the addresses specified by the host 110 in a read or write request into actual physical NAND addresses. These NAND addresses are the addresses of the target NAND devices 140. Further, the size of the DRAM 150 containing the read and write buffers is normally defined by the flash transition table. However, as the size of the memory provided by the NAND device 140 increases, the size of the flash transition table itself becomes large, and may be in the order of gigabytes of data. This may be too large to store within the SoC 130, and so in some cases the flash transition table and the read and write buffers are stored within the DRAM 150. However, storing large flash transition tables in DRAM with wide read and write buffers increases the cost of the DRAM. The power consumption of such large buffers also increases, which eats into the power budget of the SSD 120.

In order to improve performance for large data transfers, the DRAM 150 maintains ring buffers, a portion of the buffers used as read buffers and the remaining buffers used as write buffers, as depicted in FIG. 1. The ring buffers simplify the addressing scheme of the read and write data buffers. Instead of the conventional addresses that go from zero to a maximum value, which then has to wrap around as more data is transferred, the addresses maintained by the ring buffers are referenced by a head pointer and a tail pointer that are simply adjusted according to the size of data in a read or write request.
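As an illustration of the head and tail pointer scheme, the following is a minimal sketch of a generic ring buffer holding one data unit per slot. The class name, the slot granularity and the choice of which pointer marks the write position are assumptions made for the example, not details taken from the disclosure.

```python
class RingBuffer:
    """Fixed-capacity ring buffer addressed by a head and a tail pointer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.head = 0   # index of the next slot to write (assumed convention)
        self.tail = 0   # index of the oldest unread slot
        self.count = 0  # number of occupied slots

    def push(self, unit):
        """Store a unit at the head; return False if the buffer is full."""
        if self.count == self.capacity:
            return False
        self.slots[self.head] = unit
        self.head = (self.head + 1) % self.capacity  # pointer wraps automatically
        self.count += 1
        return True

    def pop(self):
        """Remove and return the oldest unit, or None if the buffer is empty."""
        if self.count == 0:
            return None
        unit = self.slots[self.tail]
        self.slots[self.tail] = None
        self.tail = (self.tail + 1) % self.capacity
        self.count -= 1
        return unit
```

Producers and consumers only ever advance a pointer, so neither side needs to track absolute addresses or handle the wrap-around explicitly.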

As data transfer rates increase (as is currently the case for SATA and NVMe devices), reading and writing large amounts of data to and from the external DRAM 150 across the DRAM interface 136 results in high power consumption and increased latency of data transfer. Currently the data transfer rates are as high as 7 GB per second, and may be bidirectional. This means that each of the read buffer and the write buffer will require a DRAM 150 that is running at over twice the rate of the host 110. This is because, during a read operation, data is transferred at a rate of 7 GB per second when being transferred from the NAND devices 140 to the DRAM 150, for example, and the data is then sent from the DRAM 150 to the host 110 at 7 GB per second. Each time the DRAM 150 is accessed for the transfer of data, power is consumed by both the DRAM 150 and the bus drivers of the DRAM interface 136. The SSD 120 is often limited in the amount of power it can draw. Additionally, the SSD may be in receipt of parallel read/write requests from the host 110 at any one time. Thus data transfers to and from the DRAM 150 will have a noticeable impact on the power consumed by the SSD 120. An SSD 120 with a limited power budget and increased data transfers involving the DRAM 150 may have to limit or throttle the rate of data transfers to keep within that budget, and will therefore suffer from lower performance because of the power consumed by the DRAM 150 during such data transfers.
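The bandwidth arithmetic above can be written out as a short sketch. The function name and the `concurrent_streams` parameter are hypothetical; the second call assumes, purely for illustration, that a read stream and a write stream both run at the full host rate at the same time.

```python
def dram_traffic_gb_s(host_rate_gb_s, concurrent_streams=1):
    """Estimate DRAM interface traffic when every byte is staged in DRAM.

    Each buffered byte crosses the DRAM interface twice: once when it is
    written into the buffer and once when it is read back out.
    """
    crossings_per_byte = 2
    return crossings_per_byte * host_rate_gb_s * concurrent_streams


# A 7 GB/s read stream staged in DRAM needs at least 14 GB/s of DRAM bandwidth.
read_only = dram_traffic_gb_s(7)
# Assumed worst case: simultaneous 7 GB/s read and write streams.
bidirectional = dram_traffic_gb_s(7, concurrent_streams=2)
```

The estimate ignores refresh and command overhead, which is why the DRAM would in practice need to run at over twice the host rate rather than at exactly twice.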

Further, in the case of a read request, for example, the data read from a target NAND device 140 may contain more errors than can be easily decoded by the controller 160. In such cases, additional data from the NAND devices 140 may be necessary for error correction, for example by performing re-reads or RAID reconstruction. Such error correction may result in a read request that takes a longer time than that for data which contains fewer errors. This results in different latencies of the data retrieved from the NAND devices 140. Additionally, some NAND devices 140 may be processing other requests received prior to the current request. As such, the controller 160 may have to wait for the earlier requests at a target NAND device to be completed before attending to the current request, which also results in varying retrieval times from a target NAND device, thereby increasing data latency.

To cope with the different latencies of the data read from the NAND devices 140, the read and write buffers in the DRAM 150 are used to accumulate the requested data. Once all the requested data has been retrieved from the target NAND devices and accumulated in the read buffer of the DRAM 150, the accumulated data is pushed to the host 110. Due to the number of requests in flight, the read buffers and the write buffers in the DRAM 150 may have to be large, e.g. 50 MB, so as to accumulate all the data associated with the requests. This increases power consumption of the SSD.

According to an embodiment of the present disclosure, there is provided a method of servicing read requests and write requests from the host 110 by enhancing the function of the ring buffers in the DRAM 150. This is done by using the internal SRAM 170 of the SoC 130 as read and write buffers in the first instance, instead of relying solely on the buffers of the external DRAM 150. The SRAM 170 uses significantly less power compared to the DRAM 150 during data transfers. By using the internal buffers of the SRAM 170 without having to solely rely on the external DRAM 150, the power consumed by the SSD 120 would be significantly reduced. Further, on-the-fly data transfers between the SRAM 170 and the NAND devices 140 reduce the latency in read and write requests when multiple operations are being executed.

As previously mentioned, due to the simplicity of implementation, the read and write buffers in the SRAM 170 may also be configured as ring buffers. Here, requests from the host 110 are handled by the controller 160, which determines whether the data to be transferred can be accumulated in the SRAM 170 buffers, or if a portion of the data needs to be additionally accumulated in the DRAM 150. The requests are handled transparently, which is to say that the control logic and/or processors within the host interface 132 or NAND interface 134 which are controlling the transfers are unaware whether the data is stored in the SRAM 170 or the DRAM 150. Once the data has been successfully accumulated, the controller 160 pushes the data either to the host 110 in the case of a read request, or to the NAND devices 140 in the case of a write request.

In this manner, the memory controller 160 effectively diverts data, and any subsequent accesses to that data, into the local SRAM 170. When the host 110 or the target NAND devices 140 are ready, the controller 160 fetches the data accumulated in the local SRAM 170 transparently. Here the controller 160 manages movement of the data between the host 110 and the target NAND devices 140 internally within the SRAM 170 in order to fulfil a read or write request. The controller 160 therefore is the only part of the system that realizes that the data has been accumulated locally within the SRAM 170. All other systems within the SoC 130 and within the SSD 120 assume that the data has been entirely stored in the external DRAM 150 prior to being transferred to the host 110 (in the case of a read request) or the target NAND devices (in the case of a write request).

When transferring the data, the memory controller 160 splits the data into defined units. Typically each defined unit would be 4,096 bytes (approximately 4 KB), for example; however, the scope of the present disclosure includes data units of any other size. This means that the local SRAM 170 will store data in individual blocks of 4 KB, i.e. the SRAM 170 has data slots of 4 KB each for accumulating data. The controller 160 maintains a tally of the number of available data slots in the SRAM 170, known as the SRAM slot machine (SSM). By checking with the SSM, the controller 160 decides on the fly whether the local SRAM read/write buffer is able to accumulate the data unit of a particular block of data. If the SRAM 170 does not have any available slots (i.e. the SRAM does not have available space), the controller 160 redirects the data unit to the read/write buffer in the DRAM 150.
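The slot tally described above can be illustrated with a minimal sketch. The names SlotTally, units_needed and place_unit are hypothetical and do not appear in the disclosure; the sketch assumes only the 4,096-byte data unit and the rule that a data unit goes to the DRAM when no SRAM slot is free.

```python
DATA_UNIT_BYTES = 4096  # data unit size given as an example in the text


def units_needed(size_bytes: int) -> int:
    """Number of data units a transfer occupies (ceiling division)."""
    return -(-size_bytes // DATA_UNIT_BYTES)


class SlotTally:
    """Hypothetical stand-in for the SSM: counts free SRAM data slots."""

    def __init__(self, total_slots: int) -> None:
        self.total = total_slots
        self.free = total_slots

    def try_acquire(self) -> bool:
        """Claim one slot if any is free; report whether it succeeded."""
        if self.free > 0:
            self.free -= 1
            return True
        return False

    def release(self) -> None:
        """Return one slot to the tally, never exceeding the total."""
        self.free = min(self.free + 1, self.total)


def place_unit(tally: SlotTally) -> str:
    """Decide on the fly where a data unit is accumulated."""
    return "SRAM" if tally.try_acquire() else "DRAM"
```

With two free slots, three consecutive data units would be placed in the SRAM, the SRAM and then the DRAM; releasing a slot makes the SRAM eligible again for the next data unit.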

The availability of the buffers in the local SRAM 170 to store data units relating to a read or write request is dependent on the configuration of the SRAM 170. Such configuration of the SRAM 170 is dependent on operating parameters programmed into the firmware configuration circuit 165 coupled to the controller 160. The operating parameters may be stored in the firmware configuration circuit 165 by software written code, for example. Exemplary operating parameters may include the number of buffers to be allocated as read buffers to handle read requests and the number of buffers to be allocated as write buffers to handle write requests.

Generally, the read requests are of greater importance than the write requests. This is because the data required by a read request needs to be supplied to the host 110 as quickly as possible because without such data, the host 110 cannot continue with its operations. Thus the thread of execution of the host 110 waits for the data in a read request until it is made available by the SSD 120 and cannot proceed until the data is obtained. Thus the SSD 120 will prioritize read requests over write requests, and it does so by programming the SRAM 170 via the firmware configuration circuit 165 such that read requests can consume more of the internal buffers of the SRAM 170 than the write requests. In some instances, write requests are able to fill up the internal buffer of the SRAM 170 but the controller 160 will have to leave a certain amount of space in the SRAM 170 buffers which is dedicated only for read commands.
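One way to picture the reservation of SRAM space for read commands is the sketch below. BufferConfig and PrioritizedSlots are hypothetical names, and a single shared pool with a read-only reserve is an assumption made for illustration; the disclosure states only that the operating parameters are programmed via the firmware configuration circuit 165.

```python
from dataclasses import dataclass


@dataclass
class BufferConfig:
    """Hypothetical operating parameters for the SRAM buffers."""
    total_slots: int
    read_reserved: int  # slots that write requests may never consume


class PrioritizedSlots:
    """Shared slot pool in which part of the space is kept for reads."""

    def __init__(self, cfg: BufferConfig) -> None:
        if not 0 <= cfg.read_reserved <= cfg.total_slots:
            raise ValueError("read_reserved must lie within total_slots")
        self.cfg = cfg
        self.free = cfg.total_slots

    def acquire(self, kind: str) -> bool:
        """Claim a slot; writes must leave the read reserve untouched."""
        floor = 0 if kind == "read" else self.cfg.read_reserved
        if self.free > floor:
            self.free -= 1
            return True
        return False

    def release(self) -> None:
        """Return one slot to the pool."""
        self.free = min(self.free + 1, self.cfg.total_slots)
```

With four slots and one reserved, write requests can take three slots before being redirected to the DRAM, while a read request can still claim the last slot.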

During operation, the SSD 120 will process multiple commands in parallel, such as, for example, a mixture of read requests and write requests in its submission queue. There is a significant difference between the time it takes for a read request to be fulfilled (of the order of tens of microseconds) by the NAND devices 140 and the time it takes for a write request to be fulfilled (of the order of milliseconds) by the NAND devices 140, the read requests being fulfilled faster. This is because when a read request is received, the controller 160 will immediately begin processing the read request by fetching the requested data from the NAND devices 140. As soon as the controller 160 fetches the data for the read request, the controller begins to process the next request in the submission queue which may be another read request or a write request, regardless of whether the requested data for the first read request has been returned.

In contrast, write operations involve multiple steps that include buffering and acknowledgement back to the host 110 as soon as the data units reach the internal buffer of the SRAM 170. Thus there is no urgent need to get all the data units from the SRAM 170 immediately to the target NAND devices 140 as the data is already accumulated within the SSD 120; the transfer of data units from the write buffer of the SRAM 170 to the target NAND devices 140 can occur at any time after an acknowledgement is sent to the host 110. However for read requests, there is an urgent need to fetch the requested data units from the respective NAND devices 140, in a parallel fetch operation, and accumulate the data in the read buffers of the SRAM 170, as the host 110 cannot proceed without the requested data. Only after all the data units have been fetched can the requested data be assembled in the read buffer of the SRAM 170 and returned to the host 110 so that the host 110 can proceed with other tasks using the read data.

FIG. 2 illustrates an exemplary flow diagram of a method 200 for processing a read request from the host 110 according to an embodiment of the present disclosure. The method begins at step 210 where a read request is received from the host 110 via the host interface 132 of the SoC 130. In step 220, the read request is processed by flash translation layer firmware running on the SoC 130 where the firmware locates the specific page and NAND chip from the NAND devices 140 in which the requested data is stored. Firmware running on the SoC 130 then sends a command to the NAND devices 140 to retrieve the requested data and transfer it to DRAM 150 via the memory controller 160. As previously mentioned, this retrieval of read data is quick and usually spans tens of microseconds. The controller 160 also determines the size of the data and the number of data units it occupies in view of the SSM of the internal SRAM 170.

In step 230, the controller 160 determines if an internal read buffer in the SRAM 170 is available for a data unit of the requested data. If there is an available slot in the read buffer of the SRAM 170 to hold the data unit, i.e. ‘Y’ at step 230 in FIG. 2, the data unit is transparently transferred from the relevant NAND device 140 to the read buffer of the SRAM 170 in step 240. No data is transferred to the DRAM 150. If there are no available slots in the read buffer of the SRAM 170 to store the data unit, i.e. ‘N’ at step 230 in FIG. 2, the data unit is transferred to the read buffer of the DRAM 150, as indicated in step 250. The controller then determines if there are more data units from the requested data to be transferred from the NAND device 140 in step 260. If there are more data units to be transferred out from the NAND device 140, i.e. ‘Y’ at step 260, the method returns to step 230 as described above.

If there are no more data units to be transferred out from the NAND devices 140 and all the requested data units have been accumulated in the internal read buffer of the SRAM 170 and the external DRAM 150 (where applicable), i.e. ‘N’ at step 260, the data is assembled and pushed to the host 110 in step 270, where the firmware of the SoC 130 sets up a data transfer between the read buffer in DRAM 150 and the host 110. The read data units in the SRAM 170 and the external DRAM 150 may then be cleared. It should be noted that in accordance with embodiments of the present disclosure, the read buffer of the SRAM 170 rarely runs out of slots for the read data, and so it is unlikely that the DRAM 150 would be utilized in a read operation. By relying on the internal SRAM 170 in this manner, the SSD 120 would not require a DRAM 150 with high bandwidth and power consumption.

As an example of a read request according to the method 200 of the present disclosure, assume that host 110 requires data units A-E from the NAND devices 140. Each of the data units A-E fits one slot in the SSM. The firmware of the SoC 130 sets up a data transfer of the data units A-E from the NAND devices 140 to the DRAM 150 via the controller 160. Once each data unit is retrieved from the relevant NAND devices 140 and transferred to the controller 160, the controller 160 determines if the read buffer of the SRAM 170 has an available slot for the respective data unit it is required to store. Thus, when data unit A is retrieved, the controller 160 receives a request to store the data unit A and determines if the read buffer of the SRAM 170 has an available slot for data unit A. If it does, data unit A is transparently transferred from the NAND devices 140 to the read buffer of the SRAM 170. The controller then receives requests to store the remaining data units. In this example data units B-E still have to be retrieved. The controller 160 then repeats step 230 for each of the remaining data units B-E as they are retrieved from the NAND devices 140. If the controller 160 determines that the read buffer of the SRAM 170 does not have any available slots for a particular data unit, say data unit C for example, the controller stores data unit C from the relevant NAND device 140 to the read buffer of the external DRAM 150.

In this example, data units D-E remain, and so the controller repeats step 230 for each of data units D-E as they are retrieved from the NAND devices 140. Thus, when receiving data unit D for storage, the controller once again determines if the read buffer of the SRAM 170 has an available slot for data unit D. If it does, data unit D is transparently transferred from the NAND devices 140 to the read buffer of the SRAM 170. Note that the read buffer of the internal SRAM 170 may have freed up due to the completion of other parallel requests in the time between storage of data unit C in the read buffer of the DRAM 150 and retrieval of data unit D from the NAND devices 140. The controller then repeats step 230 for data unit E. According to embodiments of the present disclosure, the SRAM is preferred for the accumulation of read data units as it consumes less power than the DRAM. Thus in this example, data units A, B, D and E are accumulated transparently in the read buffer of the SRAM 170 while data unit C is accumulated in the read buffer of the DRAM 150. After all of the data units A-E have been accumulated, the firmware of the SoC 130 sets up a data transfer from DRAM 150 to the host 110. The controller 160 transparently reads the data units A, B, D and E from the SRAM 170 and data unit C from DRAM 150, after which the read data units A-E in the SRAM 170 and DRAM 150 are cleared.
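The read example above may be traced with the following minimal sketch of steps 230 to 270. The function names, the dictionary used to stand in for the NAND devices, and the freed_before argument (which models slots released by other parallel requests just before a given data unit arrives) are all assumptions introduced for illustration.

```python
def accumulate_read(nand, unit_ids, sram_free, freed_before=None):
    """Accumulate each data unit in the SRAM if a slot is free, else DRAM.

    nand:         mapping of data unit id -> bytes (stand-in for NAND 140)
    sram_free:    free read-buffer slots when the request starts
    freed_before: unit id -> slots freed by other requests before that
                  unit arrives (hypothetical modelling device)
    """
    freed_before = freed_before or {}
    sram, dram = {}, {}
    for uid in unit_ids:                      # loop closed by step 260
        sram_free += freed_before.get(uid, 0)
        if sram_free > 0:                     # step 230: slot available?
            sram_free -= 1
            sram[uid] = nand[uid]             # step 240: SRAM read buffer
        else:
            dram[uid] = nand[uid]             # step 250: DRAM read buffer
    return sram, dram


def assemble(unit_ids, sram, dram):
    """Step 270: read every unit back from wherever it was accumulated."""
    return b"".join(sram[u] if u in sram else dram[u] for u in unit_ids)


nand = {u: u.encode() * 4 for u in "ABCDE"}
sram, dram = accumulate_read(nand, "ABCDE", sram_free=2, freed_before={"D": 2})
payload = assemble("ABCDE", sram, dram)
```

Starting with two free slots and two more slots freed before data unit D arrives, data units A, B, D and E land in the SRAM and data unit C lands in the DRAM, matching the example; the assembled payload is identical regardless of where each data unit was held.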

FIG. 3 illustrates an exemplary flow diagram of a method 300 for processing a write request from the host 110 according to an embodiment of the present disclosure. The method begins at step 310 where a write request is received from the host 110 via the host interface 132 of the SoC 130. The write request comprises data and target NAND devices 140 to which the data is to be written. The write request is processed by the firmware of the SoC 130 which sets up a data transfer between the host 110 and the DRAM 150. The controller 160 receives store commands for the data and determines the size of the data and the number of data units it occupies in view of the SSM of the internal SRAM 170.

In step 320, the controller 160 determines if an internal write buffer in the SRAM 170 is available for a data unit of the write data. If there is an available slot in the write buffer of the SRAM 170 to temporarily store the data unit, i.e. ‘Y’ at step 320 in FIG. 3, the data unit is transparently transferred from the host 110 to the write buffer of the SRAM 170 in step 330. In some implementations, in step 330 the data unit is also transferred to the write buffer of the DRAM 150, which allows the space occupied by the data unit in SRAM 170 to be freed immediately after the data unit is transferred to the NAND device in step 380, which in turn more quickly provides space for subsequent data units transferred from the host 110. This allows the data unit in DRAM 150 to serve as a backup in the rare event that the program operation to the NAND device fails, which may occur several milliseconds after the data unit is transferred to the NAND device. In this way, the DRAM 150, which is still being used to store data units, is rarely, or much less frequently, called upon to have data units read (if there was no space in the local write buffer in the SRAM 170 for the transfer from the host or if the program operation to the NAND device fails, then the data unit must be read or re-read from the DRAM 150) thereby effectively reducing the memory bus bandwidth by almost 50%. If there are no available slots in the write buffer of the SRAM 170 to store the data unit, i.e. ‘N’ at step 320 in FIG. 3, the data unit is transferred to the write buffer of the DRAM 150, as indicated in step 340. The controller then determines if there are more data units from the write data to be transferred from the host 110 in step 350. If there are more data units to be transferred in from the host 110, i.e. ‘Y’ at step 350, the method returns to step 320 as described above.
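The figure of almost 50% can be reasoned about with a simple count of DRAM accesses, sketched below. The accounting model is an assumption made for illustration: without the SRAM, each write data unit is written to the DRAM once and read from it once; with the SRAM and a DRAM backup copy, each data unit is still written to the DRAM once but is read back only when it missed the SRAM or its program operation failed. The function name and parameters are hypothetical.

```python
def dram_traffic_reduction(units: int, units_read_back: int) -> float:
    """Fractional reduction in DRAM accesses under the assumed model.

    units:           data units written by the host
    units_read_back: units that must still be read from the DRAM
                     (no SRAM slot was free, or a program failure)
    """
    if units <= 0 or not 0 <= units_read_back <= units:
        raise ValueError("need units > 0 and 0 <= units_read_back <= units")
    baseline = 2 * units                 # one DRAM write + one DRAM read each
    optimized = units + units_read_back  # one DRAM write each, rare reads
    return (baseline - optimized) / baseline
```

Under this model the reduction is exactly 50% when no data unit is ever read back from the DRAM, and it falls toward zero as more data units miss the SRAM, which is consistent with the "almost 50%" stated above.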

If there are no more data units to be transferred in from the host 110 and all the write data units have been accumulated in the internal write buffer of the SRAM 170, and the external DRAM 150 (where applicable), i.e. ‘N’ at step 350, an acknowledgement message is sent to the host 110 in step 360 to indicate that the write request and associated data has been received by the SSD 120. It should be noted that at this point the write data has not been written to the NAND devices 140. Rather the data units have only been transparently transferred from the host 110 to the internal write buffer of the SRAM 170, and the external DRAM 150 (where applicable). The NAND interface 134 then sets up a data transfer from the DRAM 150 for the data units (via the memory controller 160, which, for each data unit, retrieves the data unit from SRAM 170 if the data unit was stored there by the controller 160 rather than the DRAM 150) and determines if the target NAND devices 140 are ready for receiving the accumulated write data in step 370. The NAND devices 140 may not all be ready at the same time to receive the accumulated write data as they may be involved in other read/write requests which may not have completed yet. The determination in step 370 is therefore done on the fly and as soon as one of the target NAND devices 140 is ready, i.e. ‘Y’ at step 370, one or more data units, depending on the number of data units per NAND device flash page, from the accumulated write data are pushed to the target NAND device 140. In some examples the write data is typically sent to the NAND devices in a 96 KB DMA transfer (as a scatter gather list of 4 KB data units) for a NAND device flash page size of 96 KB. As each 4 KB data unit is transferred to the NAND device, the corresponding data slot occupied by the data unit in the SRAM 170 may be freed (in other words, it is not necessary to wait until the whole 96 KB NAND device flash page is transferred before individual 4 KB data unit slots in the SRAM 170 may be freed). This is done repeatedly until all the accumulated data units are pushed to the respective target NAND devices 140. If none of the NAND devices are ready, i.e. ‘N’ at step 370, the NAND interface 134 waits until a target NAND device 140 becomes available. Once the data units are transferred to the NAND devices, the write data units in the SRAM 170 may be freed. Once the data units which have been transferred are successfully programmed into the NAND devices, which may occur several milliseconds later, the data units in DRAM 150 may be cleared.
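The lifetime of a write data unit described above (SRAM slot freed on transfer, DRAM copy cleared only after successful programming) can be sketched as follows. The class and function names are hypothetical, and the sketch assumes the implementation variant in which every data unit is also copied to the DRAM write buffer.

```python
PAGE_BYTES = 96 * 1024                      # flash page size from the example
UNIT_BYTES = 4 * 1024                       # data unit size from the example
UNITS_PER_PAGE = PAGE_BYTES // UNIT_BYTES   # 24 data units per flash page


def build_sgl(unit_ids):
    """Group data unit ids into page-sized scatter gather lists."""
    return [unit_ids[i:i + UNITS_PER_PAGE]
            for i in range(0, len(unit_ids), UNITS_PER_PAGE)]


class WriteBuffers:
    """Hypothetical model of the SRAM and DRAM write buffers."""

    def __init__(self):
        self.sram = {}
        self.dram = {}

    def store(self, uid, data, sram_has_slot):
        """Accumulate a unit from the host; DRAM always holds a backup."""
        self.dram[uid] = data
        if sram_has_slot:
            self.sram[uid] = data

    def transfer_to_nand(self, uid):
        """Fetch the unit for transfer, freeing its SRAM slot right away."""
        if uid in self.sram:
            return self.sram.pop(uid)
        return self.dram[uid]

    def on_program_result(self, uid, ok):
        """Clear the DRAM copy on success; return it for a retry on failure."""
        if ok:
            del self.dram[uid]
            return None
        return self.dram[uid]
```

For instance, thirty accumulated data units would form one full scatter gather list of 24 units and a second list of 6 units. A data unit whose program operation fails can be re-read from the DRAM even though its SRAM slot was freed at transfer time.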

In the event of a power loss to the SSD 120, the data units in the SRAM 170 may already have been freed before the data unit has been successfully programmed into the NAND memory device. To prevent the write data from being permanently lost during a power loss event, the data units accumulated in the internal write buffer of the SRAM 170 may be simultaneously accumulated in the write buffer of DRAM 150, such that the data units are duplicated or backed up in the DRAM 150. In this manner, after a power loss event and while the SSD 120 is operating under backup power from a secondary power source such as batteries or super capacitors, the write data units that have been backed up in the DRAM 150 are pushed into the target NAND devices 140 thereby successfully writing the data received from the host 110 to the NAND devices 140. As noted previously, data units duplicated or backed up in the DRAM 150 may also be transferred for a second time into the NAND device 140 in the event that the data units transferred failed to be programmed correctly and the data unit in the SRAM 170 had already been freed.

As an example of a write request according to the method 300 of the present disclosure, assume that host 110 requests that data units A-E be written to target NAND devices 140. Each of the data units A-E fits one slot in the SSM. Once the data units A-E are presented to the SSD 120 by the host 110, the controller determines if the write buffer of the internal SRAM 170 is available for at least one of the data units A-E. If the SSM indicates that there are two slots available, write data units A-B are transparently pushed into the write buffer of the SRAM 170. Note that data units A-E may be transferred in any order and need not be alphabetical as exemplified here. The controller then determines if there are more data units to be written. In this example data units C-E still have to be written. The controller 160 then repeats step 320 for each of the remaining data units C-E as they are received from the host 110. After all of the data units A-E have been accumulated in the SRAM 170, an acknowledgement message is sent to the host 110 to indicate that the write request and associated data has been received by the SSD 120. The controller may also simultaneously store the data units A-E in the DRAM 150 as a backup to guard against power loss and to enable the freeing up of space in the write buffer of SRAM 170 once data units have been transferred to, but not yet programmed into, NAND devices 140. The NAND interface 134 then determines on the fly whether the target NAND devices 140 are ready. As each target NAND device 140 becomes ready, the data units A-E are transferred from the write buffer of the internal SRAM 170 to the respective NAND devices 140. As each data unit is transferred to the NAND devices 140, the write data units in the write buffer of the SRAM 170 are freed. After each data unit is successfully programmed into NAND device 140, the corresponding write data unit in the write buffer of the DRAM 150 is freed.

If the controller 160 determines that the write buffer of the SRAM 170 does not have any available slots for a particular data unit, say data unit C for example, the controller transfers data unit C from the host 110 to the write buffer of the DRAM 150. The controller then determines if there are more data units to be received from the host 110. If data units D-E remain, the controller repeats step 320 for each of data units D-E as they are received from the host 110. Thus, when receiving data unit D, the controller 160 once again determines if the write buffer of the SRAM 170 has an available slot for write data unit D. If it does, data unit D is transparently transferred from the host 110 to the write buffer of the SRAM 170. Note that the write buffer of the internal SRAM 170 may have data unit slots that have been freed up due to the completion of other parallel requests in the time between storage of data unit C in the write buffer of the DRAM 150 and receipt of data unit D from the host 110. As previously noted, the simultaneous transfer of data units to the write buffer of DRAM 150 as they are transferred to the write buffer of the local SRAM 170 enables the data units in the write buffer in SRAM 170 to be freed up as soon as the data unit is transferred to NAND devices 140, without waiting for the programming operation of the data unit to NAND devices 140 to be successfully completed. This ensures that the slots in the write buffer are occupied for the minimum amount of time, increasing the likelihood that space can be found in the write buffer of the SRAM 170 for incoming data units from the host 110. The controller 160 then repeats step 320 for write data unit E. According to embodiments of the present disclosure, the SRAM 170 is preferred for the accumulation of write data units as it enables faster read and write access times and consumes less power than the DRAM 150. Thus in this example, data units A, B, D and E are accumulated transparently in the write buffer of the SRAM 170 while data unit C is accumulated in the write buffer of the DRAM 150.

After all of the data units A-E have been accumulated, an acknowledgement message is sent to the host 110 to indicate that the write request and associated data has been received by the SSD 120. The controller may also store a backup of the data units A, B, D and E accumulated in the SRAM 170 in the DRAM 150 to guard against power loss and to enable the freeing up of space in the write buffer of SRAM 170 once data units have been transferred to, but not yet programmed into, NAND devices 140. The controller 160 then determines on the fly whether the target NAND devices 140 are ready. As each target NAND device 140 becomes ready, the write data units A, B, D and E are transferred from the write buffer of the internal SRAM 170, and the write data unit C from the write buffer of the DRAM 150, to the respective NAND devices 140. After each data unit has been transferred to the NAND devices 140, the corresponding write data unit in the SRAM 170 is freed. After each data unit is successfully programmed into NAND device 140, the corresponding write data unit in the write buffer of the DRAM 150 is freed.
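The ordering of events in the write example above, in particular that the acknowledgement precedes any transfer to the NAND devices, can be seen in the short trace sketch below. The function name, the event strings and the freed_before argument (slots released by other parallel requests before a given data unit arrives) are assumptions for illustration, and the optional DRAM backup copies are omitted for brevity.

```python
def service_write(unit_ids, sram_free, freed_before=None):
    """Return an ordered event trace for one write request."""
    freed_before = freed_before or {}
    events, source = [], {}
    for uid in unit_ids:                     # steps 320-350
        sram_free += freed_before.get(uid, 0)
        if sram_free > 0:
            sram_free -= 1
            source[uid] = "SRAM"             # step 330
        else:
            source[uid] = "DRAM"             # step 340
        events.append(f"host->{source[uid]}:{uid}")
    events.append("ack")                     # step 360, before any NAND write
    for uid in unit_ids:                     # step 370 onward, as NAND is ready
        events.append(f"{source[uid]}->NAND:{uid}")
    return events


trace = service_write("ABCDE", sram_free=2, freed_before={"D": 2})
```

With two slots free initially and two more freed before data unit D arrives, the trace shows data units A, B, D and E passing through the SRAM and data unit C passing through the DRAM, with the acknowledgement positioned between accumulation and the NAND transfers.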

Other objects, advantages and embodiments of the various aspects of the present invention will be apparent to those who are skilled in the field of the invention and are within the scope of the description and the accompanying Figures. For example, but without limitation, structural or functional elements might be rearranged consistent with the present invention. Similarly, principles according to the present invention could be applied to other examples, which, even if not specifically described here in detail, would nevertheless be within the scope of the present invention.

1. An integrated circuit system comprising: an integrated circuit; a memory controller; an internal buffer internal to the integrated circuit and communicatively coupled to the memory controller; and an external buffer external to the integrated circuit and communicatively coupled to the memory controller, wherein the memory controller is configured to receive a request to write data to an available buffer, the data being segmented into a plurality of data units, determine, for each data unit of the plurality of data units, availability of the internal buffer to temporarily store the data unit, if the internal buffer is available, write the data unit to the internal buffer without writing to the external buffer, and if the internal buffer is not available, write the data unit to the external buffer without writing the data unit to the internal buffer.